Abstract

(k, s)-SAT is the satisfiability problem restricted to instances where each clause has exactly k
literals and every variable occurs at most s times. It is known that there exists a function f
such that for s ≤ f(k) all (k, s)-SAT instances are satisfiable, but (k, f(k) + 1)-SAT is already
NP-complete (k ≥ 3). The best known lower and upper bounds on f(k) are $\Omega(2^k/k)$ and
$O(2^k/k^\alpha)$, where $\alpha = \log_3 4 - 1 \approx 0.26$. We prove that
$f(k) = O(2^k \cdot \log k/k)$, which is tight up to a log k factor.



1 Introduction 



We consider CNF formulas represented as sets of clauses, where each clause is a set of literals. A 
literal is either a variable or a negated variable. Let k, s be fixed positive integers. We denote by 
(k, s)-CNF the set of formulas F where every clause of F has exactly k literals and each variable 
occurs in at most s clauses of F. We denote the sets of satisfiable and unsatisfiable formulas by 
SAT and UNSAT, respectively. 
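The membership condition for (k, s)-CNF can be sketched in a few lines of Python. The encoding below (a literal is a nonzero integer, +v for a variable and -v for its negation) and the helper names `max_occurrences` and `is_ks_cnf` are assumptions made for illustration; they do not come from the paper.

```python
from collections import Counter


def max_occurrences(formula):
    """Largest number of clauses in which any single variable appears."""
    counts = Counter()
    for clause in formula:
        # Count each variable once per clause, regardless of its sign.
        counts.update({abs(lit) for lit in clause})
    return max(counts.values(), default=0)


def is_ks_cnf(formula, k, s):
    """True if every clause has exactly k literals and no variable
    occurs in more than s clauses."""
    return all(len(c) == k for c in formula) and max_occurrences(formula) <= s


# Illustrative formula: (x1 or x2), (not x1 or x2), (not x2 or x3).
example = {frozenset({1, 2}), frozenset({-1, 2}), frozenset({-2, 3})}
print(is_ks_cnf(example, 2, 3), is_ks_cnf(example, 2, 2))
```

In the illustrative formula the variable x2 occurs in three clauses, so the formula belongs to (2, 3)-CNF but not to (2, 2)-CNF.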

It was observed by Tovey [7] that all formulas in (3, 3)-CNF are satisfiable, and that the satisfiability 
problem restricted to (3, 4)-CNF is already NP-complete. This was generalized in Kratochvíl et 
al. [4], where it is shown that for every k ≥ 3 there is some integer s = f(k) such that 

1. all formulas in (k, s)-CNF are satisfiable, and 

2. (k, s + 1)-SAT, the SAT problem restricted to (k, s + 1)-CNF, is already NP-complete. 



The function f can be defined for k ≥ 1 by the equation 

$$f(k) := \max\{\, s : (k, s)\text{-CNF} \cap \text{UNSAT} = \emptyset \,\}.$$
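By this definition, a single unsatisfiable (k, s)-CNF formula certifies f(k) < s. The brute-force sketch below checks one such certificate for k = 2 and s = 3. The formula `example` is a small instance chosen here for illustration (it is not taken from the paper), and literals are encoded as signed integers, which is likewise an assumption.

```python
from itertools import product


def satisfiable(formula):
    """Exhaustively test all truth assignments (only viable for few variables)."""
    variables = sorted({abs(lit) for clause in formula for lit in clause})
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in formula):
            return True
    return False


# (a or b), (not a or b), (not b or c), (not c or d), (not c or not d):
# the first two clauses force b, the third forces c, the last two clash.
example = [frozenset(c) for c in ([1, 2], [-1, 2], [-2, 3], [-3, 4], [-3, -4])]

occurrences = {v: sum(1 for c in example if v in c or -v in c) for v in (1, 2, 3, 4)}
print(satisfiable(example), occurrences)
```

Every clause has two literals and no variable occurs more than three times, so this unsatisfiable formula shows f(2) < 3.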

Exact values of f(k) are only known for k ≤ 4. It is easy to verify that f(1) = 1 and f(2) = 2. It 
follows from [7] that f(3) = 3 and f(k) ≥ k in general. Also, by [6], we know that f(4) = 4. 
Upper and lower bounds for f(k), k = 5, . . . , 9, have been obtained in [1, 2, 3, 6]. For larger values 
of k, the best known lower bound, a consequence of the Lovász Local Lemma, is due to Kratochvíl et 
al. [4]: 


$$f(k) \ge \left\lfloor \frac{2^k}{e k} \right\rfloor. \tag{1}$$


The best known upper bound, due to Savický and Sgall [5], is given by 

$$f(k) \le O\!\left(\frac{2^k}{k^\alpha}\right), \tag{2}$$

where $\alpha = \log_3 4 - 1 \approx 0.26$. 
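To make the size of the gap concrete, the short sketch below evaluates α and compares the two bounds after normalizing by $2^k/k$. The lower bound then becomes the constant 1/e (the value shown in Figure 1), while the upper bound grows like $k^{1-\alpha}$. The function name `normalized_gap` is hypothetical, and the unknown constant hidden in the O-notation is simply ignored here.

```python
import math

# Exponent in the upper bound of Savicky and Sgall.
alpha = math.log(4, 3) - 1


def normalized_gap(k):
    """Growth of the upper bound relative to the lower bound, both divided
    by 2^k / k; the constant hidden in the O-notation is ignored."""
    lower = 1 / math.e            # normalized lower bound
    upper_shape = k ** (1 - alpha)  # normalized shape of 2^k / k^alpha
    return upper_shape / lower


print(round(alpha, 4))
for k in (10, 100, 1000):
    print(k, normalized_gap(k))
```

Up to constants, the gap between the two earlier bounds is therefore polynomial in k, of order $k^{1-\alpha}$.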

In this paper we asymptotically improve upon (2) and show 

$$f(k) = O\!\left(\frac{2^k \log k}{k}\right). \tag{3}$$


Our result reduces the gap between the upper and lower bounds to a log k factor. It turns out that 
the construction yielding the upper bound (3) can be generalized. We present a class of k-CNF 
formulas that is amenable to an exhaustive search using dynamic programming. This enables us 
to calculate upper bounds on f(k) for values up to k = 20000, improving upon the bounds provided 
by the constructions underlying (2) and (3). 

The remainder of the paper is organized as follows. In Section 2 we start with a simple construction 
that already provides an $O(2^k \log^2 k/k)$ upper bound on f(k). In Section 3 we refine our 
construction and obtain the upper bound (3). In the last section we describe the more general 
construction and the results obtained using computerized search. 

2 The first construction 

We denote by $\mathcal{K}(x_1, \ldots, x_k)$ the complete unsatisfiable k-CNF formula on the variables $x_1, \ldots, x_k$. 
This formula consists of all $2^k$ possible clauses. Let $\mathcal{K}^-(x_1, \ldots, x_k) = \mathcal{K}(x_1, \ldots, x_k) \setminus \{\{x_1, \ldots, x_k\}\}$. 
The only satisfying assignment for $\mathcal{K}^-(x_1, \ldots, x_k)$ is the all-False assignment. Also, for two CNF 
formulas $F_1$ and $F_2$ on disjoint sets of variables, define their product $F_1 \times F_2$ as 
$\{c_1 \cup c_2 : c_1 \in F_1 \text{ and } c_2 \in F_2\}$. Note that the satisfying assignments for $F_1 \times F_2$ are 
the assignments that satisfy $F_1$ or $F_2$. In what follows, all logarithms are to the base 2. 
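The three building blocks just defined are easy to check by exhaustive enumeration on small instances. The sketch below is an illustration under an assumed encoding (literals as signed integers, clauses as frozensets); the helper names `complete_formula`, `complete_minus`, `times` and `satisfying_assignments` are hypothetical.

```python
from itertools import product as cartesian


def complete_formula(variables):
    """K(x_1, ..., x_k): all 2^k clauses that mention every variable once."""
    return {frozenset(v if positive else -v for v, positive in zip(variables, signs))
            for signs in cartesian([True, False], repeat=len(variables))}


def complete_minus(variables):
    """K^-(x_1, ..., x_k): K without the all-positive clause."""
    return complete_formula(variables) - {frozenset(variables)}


def times(f1, f2):
    """Product of two formulas on disjoint variable sets."""
    return {c1 | c2 for c1 in f1 for c2 in f2}


def satisfying_assignments(formula, variables):
    """List every assignment of the given variables that satisfies the formula."""
    found = []
    for values in cartesian([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in formula):
            found.append(assignment)
    return found


print(satisfying_assignments(complete_formula([1, 2, 3]), [1, 2, 3]))
print(satisfying_assignments(complete_minus([1, 2, 3]), [1, 2, 3]))
both = times(complete_minus([1, 2]), complete_minus([3, 4]))
print(len(satisfying_assignments(both, [1, 2, 3, 4])))
```

The product of two copies of $\mathcal{K}^-$ on two variables each is satisfied exactly when one of the two blocks is all False, which gives 4 + 4 - 1 = 7 assignments.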



Lemma 1. $f(k) < 2^k \cdot \min_{1 \le l \le k} \left( (1 - 2^{-l})^{\lfloor k/l \rfloor} + 2^{-l} \right)$. 

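Proof. We prove the lemma by constructing an unsatisfiable (k, s)-CNF formula F where 
$s = 2^k \cdot ((1 - 2^{-l})^{\lfloor k/l \rfloor} + 2^{-l})$. Let k, l be two integers such that $1 \le l \le k$, and let $u = \lfloor k/l \rfloor$ and 
$v = k - l \cdot u$. Define the formula F as the union $F = F_0 \cup F_1 \cup \ldots \cup F_u$, where: 

$$F_0 = \mathcal{K}(z_1, \ldots, z_v) \times \prod_{i=1}^{u} \mathcal{K}^-(x_1^{(i)}, \ldots, x_l^{(i)}),$$

$$F_i = \mathcal{K}(y_1^{(i)}, \ldots, y_{k-l}^{(i)}) \times \{\{x_1^{(i)}, \ldots, x_l^{(i)}\}\} \quad \text{for } i = 1, \ldots, u.$$
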
Therefore, F is a k-CNF formula with n variables and m clauses, where 

$$n = k + u \cdot (k - l) < l + k^2/l, \tag{4}$$
$$m = 2^v \cdot (2^l - 1)^u + u \cdot 2^{k-l} = 2^k \cdot \left( (1 - 2^{-l})^{\lfloor k/l \rfloor} + u \cdot 2^{-l} \right). \tag{5}$$

To see that F is unsatisfiable, observe that any assignment satisfying $F_0$ must set all the variables 
$x_1^{(i)}, \ldots, x_l^{(i)}$ to False for some i. On the other hand, any satisfying assignment to $F_i$ must set at 
least one of the variables $x_1^{(i)}, \ldots, x_l^{(i)}$ to True. 

To bound the number of occurrences of a variable, note that the variables $z_j$, $y_j^{(i)}$, and $x_j^{(i)}$ occur 
$|F_0|$, $|F_i|$, and $|F_0| + |F_i|$ times, respectively. Since $|F_0| = 2^v \cdot (2^l - 1)^u = 2^k \cdot (1 - 2^{-l})^{\lfloor k/l \rfloor}$ and 
$|F_i| = 2^{k-l}$, we get the required result. 

□ 
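The construction in the proof can be checked exhaustively for small parameters. The following sketch builds F for k = 4 and l = 2 (so u = 2 and v = 0) and confirms the clause count, the occurrence bound and unsatisfiability by brute force. The signed-integer encoding and all helper names (`lemma1_formula` and the rest) are assumptions for illustration and are not part of the paper.

```python
from collections import Counter
from itertools import count, product as cartesian


def complete_formula(variables):
    """K(...): all sign patterns over the given variables."""
    return {frozenset(v if pos else -v for v, pos in zip(variables, signs))
            for signs in cartesian([True, False], repeat=len(variables))}


def times(f1, f2):
    """Product of two formulas on disjoint variable sets."""
    return {c1 | c2 for c1 in f1 for c2 in f2}


def lemma1_formula(k, l):
    """F = F_0 u F_1 u ... u F_u from the proof of Lemma 1."""
    u, v = divmod(k, l)                      # u = floor(k/l), v = k - l*u
    fresh = count(1)
    new_vars = lambda n: [next(fresh) for _ in range(n)]
    z = new_vars(v)
    x_blocks = [new_vars(l) for _ in range(u)]
    f0 = complete_formula(z)
    for x in x_blocks:                       # multiply by K^-(x^(i))
        f0 = times(f0, complete_formula(x) - {frozenset(x)})
    formula = set(f0)
    for x in x_blocks:                       # F_i = K(y^(i)) x {{x^(i)}}
        formula |= times(complete_formula(new_vars(k - l)), {frozenset(x)})
    return formula


def max_occurrences(formula):
    counts = Counter(abs(lit) for clause in formula for lit in clause)
    return max(counts.values())


def satisfiable(formula):
    variables = sorted({abs(lit) for clause in formula for lit in clause})
    for values in cartesian([False, True], repeat=len(variables)):
        a = dict(zip(variables, values))
        if all(any(a[abs(lit)] == (lit > 0) for lit in c) for c in formula):
            return True
    return False


F = lemma1_formula(4, 2)
print(len(F), max_occurrences(F), satisfiable(F))
```

For these parameters the formulas in the proof give $|F_0| = 9$, $|F_i| = 4$, hence m = 17 clauses and s = 13 occurrences for the x-variables.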

For $k \ge 4$, let l be the largest integer satisfying $2^l \le k \cdot \log e/\log^2 k$. It follows that 

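$$(1 - 2^{-l})^{\lfloor k/l \rfloor} \le \exp\!\left(-2^{-l} \cdot \lfloor k/l \rfloor\right) \le \exp\!\left(-\frac{\log^2 k}{k \log e} \cdot \left(\frac{k}{l} - 1\right)\right)$$

$$\le \frac{1}{\sqrt{e}} \exp\!\left(-\frac{\log^2 k}{l \log e}\right) \le \frac{1}{\sqrt{e}} \exp\!\left(-\frac{\log k}{\log e}\right) = \frac{1}{k\sqrt{e}},$$
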

where the last inequality follows from the fact that $l \le \log k$ for $k \ge 4$. Therefore, by Lemma 1 
there exists an unsatisfiable k-CNF formula F where the number of occurrences of variables is 
bounded by 

$$2^k \cdot \left( \frac{1}{k\sqrt{e}} + \frac{2 \log^2 k}{k \log e} \right).$$

It may be of interest that by (4) and (5) the number of clauses in F is $O(2^k \cdot \log k)$ and the number 
of variables is $O(k^2/\log k)$. 

Corollary 2. $f(k) = O(2^k \cdot \log^2 k/k)$. 
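The minimum in Lemma 1 can also be evaluated exactly for concrete values of k. The sketch below does so with rational arithmetic and prints the bound normalized by $2^k/k$ next to $\log^2 k$; the function name `lemma1_bound` is hypothetical.

```python
import math
from fractions import Fraction


def lemma1_bound(k):
    """Return (s, l), where s is the minimum over 1 <= l <= k of
    2^k * ((1 - 2^-l)^floor(k/l) + 2^-l) and l attains that minimum."""
    best = None
    for l in range(1, k + 1):
        q = Fraction(1, 2 ** l)
        value = 2 ** k * ((1 - q) ** (k // l) + q)
        if best is None or value < best[0]:
            best = (value, l)
    return best


for k in (4, 16, 64, 256):
    s, l = lemma1_bound(k)
    # Normalized bound s * k / 2^k, compared with log^2 k (base-2 logarithm).
    print(k, l, float(s * k / 2 ** k), math.log2(k) ** 2)
```

For k = 4 the minimum is attained at l = 1 and equals 9, so Lemma 1 gives f(4) < 9, which is consistent with f(4) = 4.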



3 A better upper bound 

To simplify the subsequent discussion, let us fix a value of k. We will only be concerned with CNF 
formulas F that have clauses of size at most k. We call a clause of size less than k an incomplete 
clause and denote $F' = \{c \in F : |c| < k\}$. A clause of size k is a complete clause, and we denote 
$F'' = \{c \in F : |c| = k\}$. 

Lemma 3. $f(k) < \min\{2^{k-l+1} : l \in \{0, \ldots, k\} \text{ and } l \cdot 2^l \le \log e \cdot (k - 2l)\}$. 

Proof. Let l be in $\{0, \ldots, k\}$, satisfying $l \cdot 2^l \le \log e \cdot (k - 2l)$, and set $s = 2^{k-l+1}$. We will 
define a sequence of CNF formulas $F_0, \ldots, F_l$. We require that (i) $F_j$ is unsatisfiable, (ii) $F_j'$ is a 
$(k - l + j)$-CNF formula, (iii) $|F_j'| \le 2^{k-l}$, and that (iv) the maximal number of occurrences of a 
variable in $F_j$ is bounded by s. It follows that $F_l$ is an unsatisfiable (k, s)-CNF formula, implying 
the claimed upper bound. 

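Set $k_j = k - l + j$ and $u_j = \lfloor (k - l + j)/(l - j + 1) \rfloor$. We proceed by induction on j. For j = 0, we 
define $F_0 = \mathcal{K}(x_1, \ldots, x_{k-l})$. It can be easily verified that $F_0$ satisfies the above four requirements. 
For j > 0, assume a formula $F_{j-1}$ on the variables $y_1, \ldots, y_n$, satisfying the requirements. We 
define the formula $F_j = \bigcup_{i=0}^{u_j} F_{j,i}$ as follows: 

$$F_{j,0} = \mathcal{K}(z_1, \ldots, z_{k_j - u_j (l-j+1)}) \times \prod_{i=1}^{u_j} \mathcal{K}^-(x_1^{(i)}, \ldots, x_{l-j+1}^{(i)}), \tag{6}$$

$$F_{j,i} = F_{j-1}'(y_1^{(i)}, \ldots, y_n^{(i)}) \times \{\{x_1^{(i)}, \ldots, x_{l-j+1}^{(i)}\}\} \cup F_{j-1}''(y_1^{(i)}, \ldots, y_n^{(i)}) \quad \text{for } i = 1, \ldots, u_j. \tag{7}$$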

It is easy to check that $F_j'$ is a $(k - l + j)$-CNF formula. To see that $F_j$ is unsatisfiable, observe that 
any assignment satisfying $F_{j,0}$ must set all the variables $x_1^{(i)}, \ldots, x_{l-j+1}^{(i)}$ to False for some i. On 
the other hand, for any satisfying assignment to $F_{j,i}$, at least one of the variables $x_1^{(i)}, \ldots, x_{l-j+1}^{(i)}$ 
must be set to True. 

Let us consider the number of occurrences of a variable in $F_j$. Consider first the y-variables. These 
variables occur only in the $u_j$ duplicates of $F_{j-1}$ and therefore occur the same number of times as 
in $F_{j-1}$, which is bounded by s by induction. The number of occurrences of an x- or z-variable is 
$|F_{j-1}'| + |F_{j,0}|$ or $|F_{j,0}|$ respectively. By induction, $|F_{j-1}'| \le 2^{k-l}$. Also, 

$$|F_j'| = |F_{j,0}| = 2^{k_j - u_j (l-j+1)} \cdot (2^{l-j+1} - 1)^{u_j} = 2^{k_j} \cdot (1 - 2^{-l+j-1})^{u_j}$$

$$\le 2^{k-l+j} \cdot \exp(-2^{-l+j-1} \cdot u_j) \le 2^{k-l+j} \cdot \exp(-2^{-l+j-1} \cdot (k - 2l)/l).$$

Taking logarithms, we get 

$$\log |F_{j,0}| \le k - l + j - \log e \cdot 2^{-l+j-1} \cdot (k - 2l)/l \le k - l + j - 2^{j-1} \le k - l.$$

Therefore, $F_l$ is an unsatisfiable (k, s)-CNF formula for $s = 2^{k-l+1}$, as long as 

$$l \cdot 2^l \le \log e \cdot (k - 2l). \tag{8}$$

□ 

Let l be the largest integer satisfying $2^l \le \log e \cdot k/(2 \log k)$. Then (8) holds for $k \ge 2$ and therefore 
we get the following: 

Corollary 4. $f(k) < 2^k \cdot 8 \log_e k/k$ for $k \ge 2$. 
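The bound of Lemma 3 is straightforward to evaluate, and comparing it with the closed form of Corollary 4 gives a quick sanity check. The sketch below assumes that condition (8) is a non-strict inequality; the function names `lemma3_bound` and `corollary4_bound` are hypothetical.

```python
import math

LOG2_E = math.log2(math.e)  # "log e" in the paper's base-2 convention


def lemma3_bound(k):
    """min of 2^(k-l+1) over l in {0..k} with l * 2^l <= log e * (k - 2l).
    The minimum is attained at the largest feasible l; l = 0 is always feasible."""
    feasible = [l for l in range(k + 1) if l * 2 ** l <= LOG2_E * (k - 2 * l)]
    return 2 ** (k - max(feasible) + 1)


def corollary4_bound(k):
    """Closed-form bound 2^k * 8 ln(k) / k, stated for k >= 2."""
    return 2 ** k * 8 * math.log(k) / k


for k in (2, 10, 100):
    print(k, lemma3_bound(k), corollary4_bound(k))
```

For k = 10 the largest feasible value is l = 2, giving f(10) < 512, while the closed form of Corollary 4 evaluates to roughly 1886.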

4 Even better upper bounds 

One way to derive better upper bounds on f(k) is to generalize the construction of Section 3. To 
this end, we first define a special way to compose CNF formulas capturing the essence of that 
construction. 
Definition 5. Let $F_1, F_2$ be unsatisfiable CNF formulas that have clauses of size at most k such 
that $F_i'$ is a $k_i$-CNF formula for i = 1, 2. Also, assume that $k_1 \le k_2 < k$. Then the formula $F_1 \circ F_2$ 
is defined as: 

$$F_1 \circ F_2 = \bigcup_{c \in \mathcal{K}^-(x_1, \ldots, x_{k-k_2})} \left( F_{1,c}' \times \{c\} \cup F_{1,c}'' \right) \cup F_2' \times \{\{x_1, \ldots, x_{k-k_2}\}\} \cup F_2'',$$

where the formulas $F_{1,c}$ are copies of $F_1$ on distinct sets of variables. 
It is not difficult to verify the following: 

Lemma 6. Let $F_1, F_2$ be formulas as above, where the number of occurrences of a variable is 
bounded by $s \ge (2^{k-k_2} - 1) \cdot |F_1'| + |F_2'|$, and let $G = F_1 \circ F_2$. Then G is an unsatisfiable CNF 
formula where each variable occurs at most s times. Furthermore, $G'$ is a $(k_1 + k - k_2)$-CNF 
formula, and $|G'| = (2^{k-k_2} - 1) \cdot |F_1'|$. 

Given k, s, we ask whether we can obtain a k-CNF formula using the following derivation rules. We 
start with the unsatisfiable formula $\{\emptyset\}$ as an axiom (this formula consists of one empty clause). 
For a set of derivable formulas, one can apply one of the following rules: 

1. If F is a derived formula such that $s \ge 2 \cdot |F'|$, then we can derive $F' \times \{\{x\}, \{\overline{x}\}\} \cup F''$, where 
x is a new variable. 

2. If $F_1, F_2$ are two derived formulas satisfying the conditions of Lemma 6, then we can derive 
the formula $F_1 \circ F_2$. 

Note 7. One can sometimes replace $F_1 \circ F_2$ in the second rule by a more compact formula 
$F_1 \circ' F_2$ that avoids duplicating $F_1$. Namely, the formula 
$F_1' \times \mathcal{K}^-(x_1, \ldots, x_{k-k_2}) \cup F_1'' \cup F_2' \times \{\{x_1, \ldots, x_{k-k_2}\}\} \cup F_2''$. 
Although this can never reduce the number of occurrences of variables, 
this modification reduces the number of clauses and variables. In the construction of Section 3 we 
always use $\circ'$ instead of $\circ$. 

Since any k-CNF formula obtained using the above procedure is an unsatisfiable (k, s)-CNF, one 
can define $f_2(k)$ as the maximal value of s such that no k-CNF formula can be obtained using the 
above procedure (clearly $f(k) \le f_2(k)$). It turns out that the function $f_2(k)$ is appealing from an 
algorithmic point of view. Given a value for s, one can check if $f_2(k) < s$ using a simple dynamic 
programming algorithm. For all l = 0, . . . , k − 1, the algorithm keeps as state the minimal size of 
$F'$ for a derivable unsatisfiable formula F where $F'$ is an l-CNF formula. This approach yields an 
algorithm that works well in practice, and we were able to calculate $f_2(k)$ for values up to k = 20000 
to get the results depicted by the graph in Figure 1. 

The computed numerical values of $f_2(k)$ seem to indicate that 

$$f_2(k) \cdot k/2^k = 0.5 \log(k) + o(\log(k)), \tag{9}$$

which is better than our upper bound by a constant factor of about 11. If (9) indeed holds, then 
a better analysis of the function $f_2$ may improve our upper bound by a constant factor. However, 
such an approach cannot improve upon the logarithmic gap left between the known upper and 
lower bounds on f(k). 
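The dynamic-programming check described above can be sketched as follows. This is one possible reading of the two derivation rules and not the authors' program: the table `best[t]` holds the smallest known $|F'|$ among derivable formulas whose incomplete clauses have size t, rule 2 is applied with $k_1 \le k_2 < k$ as in Definition 5, and the rules are iterated until nothing improves. The names `derivable` and `f2` are hypothetical, and the simple fixpoint loop is only meant for small k.

```python
from math import inf


def derivable(k, s):
    """True if the two rules can produce a k-CNF formula for this k and s."""
    best = [inf] * k        # best[t]: minimal |F'| with F' a t-CNF, 0 <= t < k
    best[0] = 1             # axiom: the formula consisting of one empty clause
    changed = True
    while changed:
        changed = False
        # Rule 1: requires s >= 2|F'|; clause size grows by one, |F'| doubles.
        for t in range(k):
            a = best[t]
            if a == inf or 2 * a > s:
                continue
            if t + 1 == k:
                return True
            if 2 * a < best[t + 1]:
                best[t + 1] = 2 * a
                changed = True
        # Rule 2: requires s >= (2^(k-k2) - 1)|F1'| + |F2'|; the result has
        # incomplete clauses of size k1 + k - k2 and (2^(k-k2) - 1)|F1'| of them.
        for k2 in range(k):
            b = best[k2]
            if b == inf:
                continue
            mult = 2 ** (k - k2) - 1
            for k1 in range(k2 + 1):
                a = best[k1]
                if a == inf or mult * a + b > s:
                    continue
                t = k1 + k - k2
                if t == k:
                    return True
                if mult * a < best[t]:
                    best[t] = mult * a
                    changed = True
    return False


def f2(k):
    """Largest s for which no k-CNF formula is derivable (binary search on s)."""
    lo, hi = 1, 2 ** k      # s = 2^k always works via repeated use of rule 1
    while lo < hi:
        mid = (lo + hi) // 2
        if derivable(k, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo - 1


print([f2(k) for k in range(1, 9)])
```

Keeping only the minimal $|F'|$ per clause size is enough in this reading because both rule conditions become easier to satisfy as $|F_1'|$ and $|F_2'|$ shrink, and derivability is monotone in s, which justifies the binary search. For k = 1 and k = 2 the sketch reproduces the known values f(1) = 1 and f(2) = 2.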
Figure 1: The bounds on $f(k) \cdot k/2^k$. (a) Lower bound of Kratochvíl et al. [4], 1/e. (b) Upper bound 
(3) obtained in Section 3 of the present paper, $8 \log_e k$. (c) Upper bound $f_2(k) \cdot k/2^k$, calculated 
by a computer program. (d) The line 0.5 log(k) + 0.23. 